The process of bootstrapping makes use of resampling with replacement. Thankfully, R has a sample function that returns from a vector or list, a sample of a given size taken with or without replacement. We make use of sample to create a function to bootstrap the median of a vector of values.
b.median = function(data, num){
### Generate num resample from data --- one for each i = 1 to num
resamples <- lapply(1:num, function(i){sample(data, replace=TRUE)})
## Get the median of each of the resamples
r.median <- sapply(resamples, median)
### Compute the standard error of the median from the num medians
std.err <- sqrt(var(r.median))
### Return a list containing the std error , the resamples, and their medians
list(std.err = std.err, reamples = resamples, medians = r.median)
}
Having created a function to bootstrap the median, it is easy to create a function to bootstrap the mean.
### Create a copy of the b.median function called b.mean
b.mean <- b.median
We now edit the function by entering into the console one of
### Edit and replace
fix(b.mean)
### Edit and assign
b.mean <- edit(b.mean)
Note that we could have used edit to take care of both operations at the same time.
b.mean <- edit(b.median)
This approach is fast but prone to overwriting the “good” version.
The b.mean function should be modified by replacing occurences of “median” with “mean”. This results in
b.mean <- function(data, num){
### Generate num resample from data --- one for each i = 1 to num
resamples <- lapply(1:num, function(i){sample(data, replace=TRUE)})
## Get the median of each of the resamples
r.mean <- sapply(resamples, mean)
### Compute the standard error of the median from the num medians
std.err <- sqrt(var(r.mean))
### Return a list containing the std error , the resamples, and their medians
list(std.err = std.err, reamples = resamples, means = r.mean)
}
### Create 100 observations from an exponential with mean 10
data <- round(rexp(100, 1/10))
hist(data)
qqnorm(data)
qqline(data)
qqplot(data, rexp(500, 1/mean(data)))
qqline(data, distribution = function(p) qexp(p, 1/mean(data)),
prob = c(0.1, 0.6), col = 2)
### Check the first 10 observations
cat("First 10 obs of original sample: ", data[1:10], "\n")
## First 10 obs of original sample: 12 10 2 6 4 9 3 19 1 2
### Compute 10000 bootstrap samples for the mean
data.b.mean <- b.mean(data, 10000)
### Check the standard error
cat("se(xbar) = ", data.b.mean$std.err, "\n")
## se(xbar) = 0.9225718
### Look at the bootstrap distribution
hist(data.b.mean$means)
qqnorm(data.b.mean$means)
qqline(data.b.mean$means)
### Compute 10000 bootstrap samples for the median
data.b.median <- b.median(data, 10000)
### Check the standard error
cat("se(m) = ", data.b.median$std.err, "\n")
## se(m) = 0.8487821
### Look at the bootstrap distribution
hist(data.b.median$medians)
qqnorm(data.b.median$medians)
qqline(data.b.median$medians)
If we use the same “data” as before and repeat the bootstrapping, we get a different results because of different samples.
R has a boot package that makes it easy to implement bootstrapping on “complex” statistics.
We can use boot to replicate the bootstrapping that was carried out above.
p_load(boot)
### The boot.mean takes the original data (d) and computes the mean of the
### selected, indexed (i) values (d[i])
boot.mean <- function(d, i){mean(d[i])}
data.boot.mean <- boot(data, boot.mean, R=10000)
summary(data.boot.mean)
## Length Class Mode
## t0 1 -none- numeric
## t 10000 -none- numeric
## R 1 -none- numeric
## data 100 -none- numeric
## seed 626 -none- numeric
## statistic 1 -none- function
## sim 1 -none- character
## call 4 -none- call
## stype 1 -none- character
## strata 100 -none- numeric
## weights 100 -none- numeric
data.boot.mean
##
## ORDINARY NONPARAMETRIC BOOTSTRAP
##
##
## Call:
## boot(data = data, statistic = boot.mean, R = 10000)
##
##
## Bootstrap Statistics :
## original bias std. error
## t1* 9.62 -0.005246 0.9278368
hist(data.boot.mean$t)
qqnorm(data.boot.mean$t)
qqline(data.boot.mean$t)
boot.ci(data.boot.mean)
## Warning in boot.ci(data.boot.mean): bootstrap variances needed for studentized
## intervals
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 10000 bootstrap replicates
##
## CALL :
## boot.ci(boot.out = data.boot.mean)
##
## Intervals :
## Level Normal Basic
## 95% ( 7.807, 11.444 ) ( 7.740, 11.380 )
##
## Level Percentile BCa
## 95% ( 7.86, 11.50 ) ( 7.98, 11.68 )
## Calculations and Intervals on Original Scale
Computing a confidence interval for the sample correlation can be challenging. Bootstrapping makes this relatively easy.
htwt <- read.csv("HtWt.csv")
head(htwt)
## Height Weight Group
## 1 64 159 1
## 2 63 155 2
## 3 67 157 2
## 4 60 125 1
## 5 52 103 2
## 6 58 122 2
### Create a function to compute the correlation of reseampled data
f <- function(d, i){corr(d[i,])}
boot.htwt.corr <- boot(htwt[,1:2], f, R=10000)
boot.htwt.corr
##
## ORDINARY NONPARAMETRIC BOOTSTRAP
##
##
## Call:
## boot(data = htwt[, 1:2], statistic = f, R = 10000)
##
##
## Bootstrap Statistics :
## original bias std. error
## t1* 0.9868154 -0.0002478202 0.005661326
qqnorm(boot.htwt.corr$t)
qqline(boot.htwt.corr$t)
boot.ci(boot.htwt.corr)
## Warning in boot.ci(boot.htwt.corr): bootstrap variances needed for studentized
## intervals
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 10000 bootstrap replicates
##
## CALL :
## boot.ci(boot.out = boot.htwt.corr)
##
## Intervals :
## Level Normal Basic
## 95% ( 0.9760, 0.9982 ) ( 0.9788, 1.0003 )
##
## Level Percentile BCa
## 95% ( 0.9734, 0.9948 ) ( 0.9702, 0.9941 )
## Calculations and Intervals on Original Scale
We can bootstrap the slope and intercept of a simple regression model.
htwt <- read.csv("http://facweb1.redlands.edu/fac/jim_bentley/data/math 312/bootstrap/htwt.csv")
head(htwt)
## Height Weight Group
## 1 64 159 1
## 2 63 155 2
## 3 67 157 2
## 4 60 125 1
## 5 52 103 2
## 6 58 122 2
### Create a function to compute the correlation of reseampled data
f <- function(d, i){coef(lm(d$Weight[i] ~ d$Height[i]))}
boot.htwt.coef <- boot(htwt[,1:2], f, R=10000)
boot.htwt.coef
##
## ORDINARY NONPARAMETRIC BOOTSTRAP
##
##
## Call:
## boot(data = htwt[, 1:2], statistic = f, R = 10000)
##
##
## Bootstrap Statistics :
## original bias std. error
## t1* -173.459595 -0.588689709 12.6002354
## t2* 5.041217 0.009350654 0.2007861
qqnorm(boot.htwt.coef$t[,1])
qqline(boot.htwt.coef$t[,1])
qqnorm(boot.htwt.coef$t[,2])
qqline(boot.htwt.coef$t[,2])